
feat: Switch DB to postgres #10

Merged 1 commit into main from postgres on Jan 23, 2024

Conversation

bjchambers
Contributor

This uses asyncpg directly.

This removes the tags from the methods, and changes the names to be more descriptive. The intent is to have a single service generated from the OpenAPI specification that includes methods like retrieve_chunks rather than chunks.retrieve(...).

@@ -27,26 +27,26 @@ class RetrieveRequest(BaseModel):
"""Whether to include a generated summary."""


class BaseStatement(BaseModel):
class BaseChunk(BaseModel):
Contributor Author


Attempting to `s/Statement/Chunk/`.



@router.post("/retrieve")
async def retrieve(store: StoreDep, request: RetrieveRequest) -> RetrieveResponse:
"""Retrieve statements based on a given query."""
async def retrieve_chunks(store: StoreDep, request: RetrieveRequest) -> RetrieveResponse:
Contributor Author


Resource Naming: I think it makes sense for this to be `/chunks/retrieve` since it operates on chunks (and conceivably we could have other methods, like listing chunks, etc.).

Method Naming: I can see `retrieve` (if this is the only retrieve method, so we have `service.retrieve(...)`), but I called it `retrieve_chunks` for consistency with other methods that name the resource. Thoughts?

Contributor


`retrieve_chunks` seems fine - it's explicit about what's being returned, which may be good.

It will be easier (in the UI) to treat this as a GET with query params until we have enough query complexity to warrant a full-on POST body. It would also simplify the API a bit, allowing this to be treated as just the list view over chunks. I don't want to overly emphasize the FE's needs, though.

collection_validator = TypeAdapter(Collection)


class CollectionCreate(BaseModel):
Contributor Author


Not sure how we want to standardize naming. Basically, I have:

  • `Collection`: the model that we generally return.
  • `CollectionCreate` (for creation). Could also do `CreateCollection`, `CreateCollectionInput`, `CreateCollectionRequest`, etc.

Contributor Author


It looks like Litestar at least goes with `<Resource>` (for the result), `<Resource>Create` (for the creation request), and `<Resource>Update` (for update requests):

```python
from datetime import date
from uuid import UUID

from pydantic import BaseModel

class Author(BaseModel):
    id: UUID | None
    name: str
    dob: date | None = None

class AuthorCreate(BaseModel):
    name: str
    dob: date | None = None

class AuthorUpdate(BaseModel):
    name: str | None = None
    dob: date | None = None
```

Contributor


I usually ended up with something like `CreateCollectionRequest` and `CreateCollectionResponse` in the gRPC days, but maybe that's more verbose than is necessary in this case.

session.refresh(collection)
return collection
result = await conn.fetchrow("""
INSERT INTO collection (name) VALUES ($1)
Contributor Author


I think asyncpg prepares statements using an LRU cache keyed on the hash of the query text. We could (maybe) manage that manually for more efficiency... but it should be unnecessary -- we'll likely only have one or two queries per request, so the hashing overhead shouldn't be too bad.
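One easy way to lean on that behavior (a sketch; the constant and helper names are assumptions, not the PR's code) is to keep the query text in a module-level constant so every call passes the identical string and asyncpg's per-connection statement cache can hit:

```python
# Module-level query constant: asyncpg keys its per-connection prepared-
# statement LRU cache on the query text, so reusing the identical string
# lets the driver skip a re-prepare.
INSERT_COLLECTION = "INSERT INTO collection (name) VALUES ($1) RETURNING id, name"

async def create_collection(conn, name):
    # `conn` is assumed to be an asyncpg.Connection; same literal every
    # call means the statement is prepared once and reused afterwards.
    return await conn.fetchrow(INSERT_COLLECTION, name)
```

If it ever matters, the cache size is also tunable at connect time via `asyncpg.connect(..., statement_cache_size=...)`.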

"store": Store(),
"engine": engine,
}
# if settings.APPLY_MIGRATIONS:
Contributor Author


I tried to get yolo to run in the application, but it seems to only support older DB versions. I suspect our best path here is to just run migrations ourselves, at least until we do migrations outside of docker compose. We could just have a loop over migration files or something if we really want. For now, I did something super stupid.

See https://gist.github.com/mattbillenstein/270a4d44cbdcb181ac2ed58526ae137d for an example script we could use.
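The loop-over-migration-files idea could look roughly like this sketch (the `schema_migrations` table and directory layout are assumptions; real code would also want locking if multiple app instances start at once):

```python
from pathlib import Path

def pending_migrations(migrations_dir, applied):
    """Return unapplied .sql files in lexicographic (i.e. version-prefix) order."""
    files = sorted(Path(migrations_dir).glob("*.sql"))
    return [f for f in files if f.name not in applied]

async def apply_migrations(conn, migrations_dir):
    # `conn` is assumed to be an asyncpg.Connection; `schema_migrations`
    # is a hypothetical bookkeeping table of already-applied files.
    rows = await conn.fetch("SELECT name FROM schema_migrations")
    applied = {r["name"] for r in rows}
    for f in pending_migrations(migrations_dir, applied):
        # Each migration runs in its own transaction so a failure
        # doesn't leave a half-applied file recorded as done.
        async with conn.transaction():
            await conn.execute(f.read_text())
            await conn.execute(
                "INSERT INTO schema_migrations (name) VALUES ($1)", f.name
            )
```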

Contributor


Yeah I say KISS

- db:/var/lib/postgresql
networks:
- kb-network
healthcheck:
Contributor Author


This causes things that depend on postgres to wait until it reports ready. I'd like to run migrations as part of that, but that is messy.
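For reference, the compose pattern in play is roughly this (service names and timings are assumptions, not necessarily what this file uses):

```yaml
db:
  image: postgres
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U postgres"]
    interval: 5s
    timeout: 3s
    retries: 10

app:
  depends_on:
    db:
      condition: service_healthy  # waits until the healthcheck passes
```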


CREATE TABLE collection (
id SERIAL NOT NULL,
name VARCHAR NOT NULL,
Contributor Author


At some point soon, I'd like to add a `tenant` table, and a `tenant_id` to the collection.
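A hypothetical shape for that, following the conventions of the tables already in this migration (names are assumptions):

```sql
CREATE TABLE tenant (
    id SERIAL NOT NULL,
    name VARCHAR NOT NULL,
    PRIMARY KEY (id)
);

-- Each collection would then belong to a tenant:
ALTER TABLE collection
    ADD COLUMN tenant_id INTEGER NOT NULL,
    ADD FOREIGN KEY (tenant_id) REFERENCES tenant (id);
```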

FOREIGN KEY(document_id) REFERENCES document (id)
);

-- CREATE TABLE ingestion(
Contributor Author


I'd like to split ingestions out. This could make it possible to add a new ingestion (e.g., reconfigure things and re-ingest) to existing documents, as well as support changes to documents.

from app.ingest.extract import extract
from app.ingest.extract.source import ExtractSource
from app.ingest.store import Store, StoreDep

router = APIRouter(tags=["documents"], prefix="/documents")
# TODO: Move this to `/documents`. Will require figuring out
Contributor Author


See this TODO. I'd like to flatten it, but that is tricky for the `list_documents` case (which collection do we list?). Also, need to make sure we can do file upload with a JSON body in `add_document`. So, deferring to a separate PR.



@bjchambers bjchambers merged commit 3277785 into main Jan 23, 2024
@bjchambers bjchambers deleted the postgres branch January 26, 2024 22:34
@bjchambers bjchambers added the enhancement New feature or request label Jan 31, 2024